This project aims to identify and explain the differences between cell annotations obtained with the Seurat and SingleR packages. The dataset chosen for analysis contains immune cells isolated from brain tissue removed during surgery for the treatment of epilepsy.

Seurat analysis

Quality control (QC)

Feature - count relationship before QC

Filter dying and low-quality cells

Dying cells show a high percentage of counts from mitochondrial genes.

Feature - count relationship after QC

Cells were removed if they had:

  • more than 20% of counts from mitochondrial genes (dying cells)

  • fewer than 300 detected genes (low-quality cells or empty droplets)

  • more than 5000 detected genes (cell multiplets or doublets)

After quality control, a total of 85,066 cells and 19,774 genes were retained for further analysis.
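The filtering rule above can be written as a simple predicate. This is a minimal sketch in plain Python, not the Seurat call used in the project; the `CellQC` record and its field names are hypothetical, and treating the thresholds as inclusive on the "keep" side is an assumption based on the wording of the list.

```python
from dataclasses import dataclass

# Thresholds taken from the QC criteria listed above.
MAX_PERCENT_MT = 20.0   # cells above this are treated as dying
MIN_GENES = 300         # cells below this are low quality or empty droplets
MAX_GENES = 5000        # cells above this are likely multiplets or doublets


@dataclass
class CellQC:
    """Hypothetical per-cell QC record (names are illustrative only)."""
    barcode: str        # cell identifier
    n_genes: int        # number of genes detected in the cell
    percent_mt: float   # percentage of counts from mitochondrial genes


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives all three filters."""
    return (
        cell.percent_mt <= MAX_PERCENT_MT
        and MIN_GENES <= cell.n_genes <= MAX_GENES
    )


cells = [
    CellQC("cell_a", n_genes=1200, percent_mt=4.5),   # kept
    CellQC("cell_b", n_genes=150, percent_mt=3.0),    # too few genes
    CellQC("cell_c", n_genes=6200, percent_mt=2.0),   # too many genes
    CellQC("cell_d", n_genes=900, percent_mt=35.0),   # high mitochondrial share
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```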

Feature selection

For further analysis, the 2,000 most highly variable genes were selected after the data had been normalized with the “LogNormalize” method.
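As an illustration of what “LogNormalize” does to a single cell, the sketch below divides each gene count by the cell's total count, multiplies by a scale factor and applies a natural-log transform with `log1p`. The scale factor of 10,000 is Seurat's default; that the project used the default is an assumption, and the function name is hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the raw gene counts of one cell.

    Each count is divided by the cell's total count, multiplied by
    scale_factor (assumed here to be Seurat's default of 10,000) and
    transformed with log1p, i.e. ln(1 + x).
    """
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to normalize.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


normalized = log_normalize([1, 1, 2, 0])
print(normalized)
```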

PCA

PCA was used as the dimensionality reduction technique. The data were scaled before PCA.
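Scaling before PCA gives every gene a mean of 0 and unit variance across cells, so that highly expressed genes do not dominate the principal components. The sketch below shows this for one gene in plain Python; the function name is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def scale_gene(values):
    """Center and scale the expression of one gene across cells.

    Returns z-scores: (value - mean) / standard deviation.
    A gene with constant expression is returned as all zeros.
    """
    if len(values) < 2:
        return [0.0 for _ in values]
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


scaled = scale_gene([1.0, 2.0, 3.0])
print(scaled)
```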

Dimensionality definition

Based on the results of JackStrawPlot, ElbowPlot and DimHeatmap, PCs 1-20 were selected for clustering.

Clustering

The dataset was clustered using a KNN-graph approach. Clustering was run at resolutions from 0.1 to 1 in increments of 0.1, and the clusters were annotated after each run; a resolution of 0.5 gave the best result. Clusters were visualized with tSNE.

Cluster biomarkers

Biomarkers were identified using the following parameters:

  • the feature must be detected in at least 25% of cells in either of the two groups being compared

  • only positive markers are kept
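The two criteria above can be sketched as a filter on one gene. This is a simplified illustration rather than the marker test used in the project: the real analysis also involves a log fold change and a statistical test, whereas here "positive" is reduced to a higher mean expression in the cluster. The function names are hypothetical.

```python
from statistics import mean

MIN_PCT = 0.25  # minimum detection rate required in at least one group


def detection_rate(values):
    """Fraction of cells in which the gene is detected (expression > 0)."""
    return sum(1 for v in values if v > 0) / len(values)


def is_candidate_marker(cluster_values, other_values, min_pct=MIN_PCT):
    """Apply the two filters to one gene.

    cluster_values: expression of the gene in the cells of the cluster.
    other_values:   expression of the gene in all remaining cells.
    """
    detected_enough = max(
        detection_rate(cluster_values), detection_rate(other_values)
    ) >= min_pct
    # Simplification: "positive" means higher mean expression in the cluster.
    positive = mean(cluster_values) > mean(other_values)
    return detected_enough and positive


print(is_candidate_marker([2, 0, 3, 1], [0, 0, 0, 1]))
```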

First, the markers of each cluster were looked up in a database to form a hypothesis about the cell type. The clusters were then annotated using established marker genes.

Microglia markers

Microglia are the primary immune cells of the CNS and are highly similar to peripheral macrophages.

Clusters 0, 1, 2, 3, 4, 6, 7, 8 and 10 can be annotated as microglia.

T-cell markers

Cluster 5 can be annotated as T-cells.

NK-cell markers

Cluster 12 can be annotated as NK-cells.

B-cell markers

Cluster 14 can be annotated as B-cells.

Oligodendrocyte markers

Cluster 13 can be annotated as oligodendrocytes.

Fibroblast markers

Clusters 9, 15 and 17 can be annotated as fibroblasts.

Endothelial cell markers

Cluster 16 can be annotated as endothelial cells.

Dendritic cell markers

Cluster 11 can be annotated as dendritic cells.

Cell types

Seurat annotation result:

SingleR

Data preprocessing was the same as in the Seurat analysis.

Annotation of each cell

Reference - HumanPrimaryCellAtlasData

The Human Primary Cell Atlas was chosen as the reference because most of its labels refer to blood subpopulations, while cell types from other tissues are also available. Our samples are isolated immune cells, but they come from the brain, so other cell types may be present as well.

The results show that our samples contain many unexpected cell types, for example hepatocytes and erythroblasts.

In the manual annotation, many cells were assigned to microglia. The reason we do not see the same result here is that the selected reference does not contain this cell type, nor many other brain cell types. However, this reference is still important for evaluating immune cells, since datasets containing brain cells usually do not include them.

Annotation diagnostic:

Heatmap of the assignment scores

Reference - Darmanis brain data

The Darmanis brain data were chosen as a reference for the brain cell types that were not represented in the previous reference.

The result contains only 6 cell types, but many cells were annotated as microglia, which is more likely to be correct. However, immune cells cannot be analysed with this reference.

Annotation diagnostic:

Heatmap of the assignment scores

Reference - DarmanisBrainData and HumanPrimaryCellAtlasData

The two references were combined to solve the annotation problems seen with each reference on its own.

Annotation diagnostic:

Heatmap of the assignment scores

Comparison of annotations

Here I compare annotations obtained with SingleR and annotations made manually.

In general, clusters consisting of many cells are annotated in the same way. However, in some cases SingleR assigns some of the cells in a cluster to a different type, because the annotation is performed for each cell independently. I speculate that this difference might be smaller if the clustering parameters in Seurat had been chosen more optimally. Suboptimal clustering may not have provided enough resolution to identify rare cell types.
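One way to quantify this kind of comparison is to cross-tabulate, for every cell, the manual label of its cluster against its per-cell label and compute the share of matching cells per manual label. The sketch below uses made-up labels purely for illustration, not results from this project, and assumes both annotations use the same spelling for the same cell type.

```python
from collections import Counter, defaultdict


def crosstab(manual_labels, per_cell_labels):
    """Count per-cell labels within each manual (cluster-level) label."""
    if len(manual_labels) != len(per_cell_labels):
        raise ValueError("both annotations must cover the same cells")
    table = defaultdict(Counter)
    for manual, per_cell in zip(manual_labels, per_cell_labels):
        table[manual][per_cell] += 1
    return table


def agreement(table):
    """Fraction of cells whose per-cell label equals the manual label."""
    return {
        manual: counts[manual] / sum(counts.values())
        for manual, counts in table.items()
    }


# Toy example with invented labels (not data from the analysis).
manual = ["microglia"] * 4 + ["T cells"] * 2
per_cell = ["microglia", "microglia", "microglia", "T cells",
            "T cells", "NK cells"]
table = crosstab(manual, per_cell)
print(agreement(table))
```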

It is worth noting that the difference between annotations is larger for small clusters. I think this can be explained by the fact that manual annotation usually relies on established marker genes. First, the choice of markers is subjective because, as a rule, not all possible markers are checked, only those of interest to the researcher. Second, the assessment of the expression of these markers is also subjective. SingleR, in contrast, uses correlation analysis restricted to variable genes and compares the expression profile of each cell with the reference. In my opinion, the main advantages of this method are that it is more accurate and less subjective.
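The core idea of correlation-based annotation can be sketched as follows: compute the Spearman correlation between a cell's expression profile and each reference profile, then assign the label with the highest score. This is only an illustration of the principle, not SingleR's implementation, which additionally selects marker genes, aggregates scores over several reference samples per label and fine-tunes the result. The profiles and function names below are hypothetical.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def assign_label(cell, reference):
    """Return the best-scoring reference label and all scores."""
    scores = {label: spearman(cell, profile)
              for label, profile in reference.items()}
    return max(scores, key=scores.get), scores


# Toy profiles over the same four genes (invented values).
reference = {"microglia": [9, 2, 1, 4], "T cells": [0, 7, 8, 1]}
label, scores = assign_label([5, 1, 0, 3], reference)
print(label, scores)
```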

Cluster annotation

Reference - DarmanisBrainData and HumanPrimaryCellAtlasData

SingleR can also annotate whole clusters at once instead of individual cells.

The combination of the two references was used here, since it gave the most biologically meaningful results in the annotation of individual cells.

Annotation diagnostic:

Heatmap of the assignment scores

Conclusion

An independent annotation of each cell provides more information, as it allows you to identify transient types and rare cell types. However, the quality of the annotation will strongly depend on the reference you choose. In addition, it is not possible to identify a new cell type because it will not be in the reference. As for manual annotation, it is a creative process. Its quality is largely up to you. However, it takes a lot of time and, unfortunately, is usually subjective.